grep -l @power et al. /news/*.json | wc -l → 1

@Power et al.

mentions: 1 | type: Person | feed: RSS
03:10
2026-05-20
wanglun1996.github.io
large-language-models

Evals Will Break and You Won't See It Coming

Current evaluation methods for large language models (LLMs) are fundamentally reactive and fail to anticipate qualitative shifts in capabilities, such as emergent abilities or strategic information wi…
